Mercer’s Theorem on General Domains: on the Interaction between Measures, Kernels, and RKHSs
Abstract
Given a compact metric space $X$ and a strictly positive Borel measure $\nu$ on $X$, Mercer's classical theorem states that the spectral decomposition of a positive self-adjoint integral operator $T_k\colon L_2(\nu) \to L_2(\nu)$ associated with a continuous function $k$ yields a series representation of $k$ in terms of the eigenvalues and eigenfunctions of $T_k$. An immediate consequence of this representation is that $k$ is a (reproducing) kernel and that its reproducing kernel Hilbert space (RKHS) can also be described by these eigenvalues and eigenfunctions. It is well known that Mercer's theorem has found important applications in various branches of mathematics, including probability theory and statistics. For some applications in the latter areas in particular, however, it would be highly convenient to have a form of Mercer's theorem for more general spaces $X$ and kernels $k$. Unfortunately, all existing extensions of Mercer's theorem in this direction either stick too closely to the original topological structure of $X$ and $k$, or replace absolute and uniform convergence by weaker notions of convergence that are not strong enough for many statistical applications. In this work we fill this gap by establishing several Mercer-type series representations for $k$ that, on the one hand, make only very mild assumptions on $X$ and $k$ and, on the other hand, provide convergence results strong enough for interesting applications in, e.g., statistical learning theory. To illustrate the latter, we first use these series representations to describe the ranges of fractional powers of $T_k$ in terms of interpolation spaces, and we investigate under which conditions these interpolation spaces are contained in $L_\infty(\nu)$. For both results, we then discuss applications to the analysis of so-called least squares support vector machines, a state-of-the-art learning algorithm.
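The series representation at the heart of Mercer's theorem can be illustrated numerically. The sketch below is a minimal illustration under assumptions that do not come from the paper: $\nu$ is the uniform distribution on $[0,1]$, approximated by a midpoint rule with $n = 30$ nodes (a Nyström-type discretization of $T_k$); $k$ is a Gaussian kernel of width 0.2; and the helper names `jacobi_eigh`, `gaussian_kernel` and `truncation_error` are hypothetical. It computes the eigenvalues $\mu_i$ and $L_2(\nu)$-normalized eigenfunctions $e_i$ of the discretized operator with a plain cyclic Jacobi solver (chosen only to avoid third-party dependencies) and reports the sup-norm error of the truncated series $\sum_{i<m} \mu_i e_i(x) e_i(x')$.

```python
import math


def jacobi_eigh(a, tol=1e-12, max_sweeps=100):
    """Eigen-decomposition of a real symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors): eigenvalues in decreasing order,
    eigenvectors stored as the columns of a list-of-rows matrix.
    """
    n = len(a)
    a = [row[:] for row in a]                        # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that annihilates the off-diagonal entry a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):                   # A <- A J (columns p, q)
                    arp, arq = a[r][p], a[r][q]
                    a[r][p], a[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(n):                   # A <- J^T A (rows p, q)
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * apr - s * aqr, s * apr + c * aqr
                for r in range(n):                   # accumulate V <- V J
                    vrp, vrq = v[r][p], v[r][q]
                    v[r][p], v[r][q] = c * vrp - s * vrq, s * vrp + c * vrq
    order = sorted(range(n), key=lambda i: -a[i][i])
    vals = [a[i][i] for i in order]
    vecs = [[v[r][i] for i in order] for r in range(n)]
    return vals, vecs


def gaussian_kernel(x, y, width=0.2):
    """Hypothetical choice of continuous kernel k on [0, 1]."""
    return math.exp(-((x - y) ** 2) / (2.0 * width ** 2))


n = 30
nodes = [(j + 0.5) / n for j in range(n)]            # midpoint rule for nu = uniform on [0, 1]
# Discretized operator: (T_k f)(x_i) ~ (1/n) * sum_j k(x_i, x_j) f(x_j)
T = [[gaussian_kernel(xi, xj) / n for xj in nodes] for xi in nodes]
mu, vecs = jacobi_eigh(T)
# Rescale so that sum_j e_i(x_j)^2 / n = 1 (unit norm in the discrete L2(nu))
e = [[math.sqrt(n) * vecs[j][i] for i in range(n)] for j in range(n)]  # e[j][i] = e_i(x_j)


def truncation_error(m):
    """Max over node pairs of |k(x, y) - sum_{i < m} mu_i e_i(x) e_i(y)|."""
    return max(
        abs(gaussian_kernel(nodes[r], nodes[s])
            - sum(mu[i] * e[r][i] * e[s][i] for i in range(m)))
        for r in range(n) for s in range(n))


for m in (1, 2, 4, 8, n):
    print(m, truncation_error(m))
print("sum of eigenvalues:", sum(mu))  # trace of the discretized T_k: 1 up to rounding
```

Each truncation residual is itself a positive semi-definite kernel on the nodes, so its largest absolute value sits on the diagonal and the reported error cannot increase with $m$ (up to rounding). This discrete picture mirrors the uniform convergence in Mercer's classical theorem; it says nothing about the general domains treated in the paper.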
Besides these results, we also use the obtained Mercer representations to show that every self-adjoint nuclear operator $L_2(\nu) \to L_2(\nu)$ is an integral operator whose representing function $k$ is the difference of two (reproducing) kernels.
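The last claim can be illustrated in a finite-rank toy case. The sketch below is an illustration under stated assumptions, not the paper's proof: $\nu$ is Lebesgue measure on $[0,1]$, and a self-adjoint operator is specified by prescribing real, summable eigenvalues $\lambda_n = (-1)^n/(n+1)^2$ on the orthonormal cosine basis $e_0 = 1$, $e_n(x) = \sqrt{2}\cos(n\pi x)$, truncated at a hypothetical `N_TERMS = 20` so that the operator is finite-rank and hence nuclear. Splitting the spectral decomposition by the sign of $\lambda_n$ gives two positive semi-definite kernels `k_plus` and `k_minus` with $k = k_+ - k_-$, while $k$ itself is not positive semi-definite. All names here are hypothetical.

```python
import math
import random

N_TERMS = 20  # hypothetical truncation: keeps the operator finite-rank, hence nuclear


def e(n, x):
    """Orthonormal cosine basis of L2([0, 1]) with respect to Lebesgue measure."""
    return 1.0 if n == 0 else math.sqrt(2.0) * math.cos(n * math.pi * x)


# Hypothetical eigenvalues of a self-adjoint operator: real, summable, mixed signs.
lam = [(-1) ** n / (n + 1) ** 2 for n in range(N_TERMS)]


def k(x, y):
    """Representing function: k(x, y) = sum_n lam_n e_n(x) e_n(y)."""
    return sum(l * e(n, x) * e(n, y) for n, l in enumerate(lam))


def k_plus(x, y):
    """Kernel built from the positive part of the spectrum."""
    return sum(l * e(n, x) * e(n, y) for n, l in enumerate(lam) if l > 0)


def k_minus(x, y):
    """Kernel built from the negative part of the spectrum, signs flipped."""
    return sum(-l * e(n, x) * e(n, y) for n, l in enumerate(lam) if l < 0)


def quad_form(kernel, pts, c):
    """Return c^T K c for the Gram matrix K[i][j] = kernel(pts[i], pts[j])."""
    return sum(ci * kernel(xi, xj) * cj
               for xi, ci in zip(pts, c) for xj, cj in zip(pts, c))


M = 32
nodes = [(j + 0.5) / M for j in range(M)]  # midpoint grid on [0, 1]
random.seed(0)
coeffs = [random.gauss(0.0, 1.0) for _ in nodes]

print(max(abs(k(x, y) - (k_plus(x, y) - k_minus(x, y))) for x in nodes for y in nodes))
print(quad_form(k_plus, nodes, coeffs), quad_form(k_minus, nodes, coeffs))
# k is not positive semi-definite: e_1 is an eigenfunction with eigenvalue -1/4
print(quad_form(k, nodes, [e(1, x) for x in nodes]))
```

On this midpoint grid the cosine functions with index below `M` are exactly orthogonal (a discrete cosine transform identity), which is why the last quadratic form equals $\lambda_1 M^2 = -M^2/4$ up to rounding.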
Similar resources
Mercer’s Theorem for Quaternionic Kernels
the series being uniformly and absolutely convergent in (x,y). A number of generalisations of Mercer's theorem may be found in the literature, in particular dealing with kernels K : Y × Y → C for various choices of Y. However, there would appear to have been (to the best of the author's knowledge) no attempts made to extend Mercer's theorem to cover non-complex-valued kernels. In the present pa...
Mercer's Theorem, Feature Maps, and Smoothing
We study Mercer’s theorem and feature maps for several positive definite kernels that are widely used in practice. The smoothing properties of these kernels will also be explored.
Characteristic Kernels on Groups and Semigroups
Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments. For sufficiently rich (characteristic) RKHSs, each probability distribution has a unique embedding, allowing all statistical properties of the distribution to be taken into consideration. Necessary and sufficient conditions for an RKHS to be cha...
Improving the Performance of Text Categorization using N-gram Kernels
Kernel methods are known for their robustness in handling large feature spaces and are widely used as an alternative to external feature extraction based methods in tasks such as classification and regression. This work follows the approach of using different string kernels, such as n-gram kernels and gappy-n-gram kernels, for text classification. It studies how kernel concatenation and feature com...
Covariate Shift in Hilbert Space: A Solution via Surrogate Kernels
Covariate shift is an unconventional learning scenario in which training and testing data have different distributions. A general principle for solving the problem is to make the training data distribution similar to that of the test domain, such that classifiers computed on the former generalize well to the latter. Current approaches typically target sample distributions in the input space, ho...